On the use of pitch normalization for improving children's speech recognition

Authors

  • Rohit Sinha
  • Shweta Ghai

Abstract

In this work, we have studied the effect of pitch variations across speech signals in the context of automatic speech recognition. Our initial study on vowel data indicates that, owing to insufficient smoothing of pitch harmonics by the filterbank, particularly for high-pitch signals, the variance of mel-frequency cepstral coefficient (MFCC) features increases significantly as the pitch of the speech signal increases. To reduce the variance of MFCC features caused by pitch differences among speakers, a maximum-likelihood-based explicit pitch normalization method has been explored. On a connected-digit recognition task, pitch normalization yields a relative improvement of 15% over the baseline when children's speech (higher pitch) is recognized using models trained on adults' speech (lower pitch).
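The abstract describes the normalization only as maximum-likelihood based. One common way to realize such a choice, analogous to warp-factor selection in vocal tract length normalization, is a grid search: transform the test utterance with each candidate pitch-scaling factor, score the result against the trained acoustic model, and keep the factor with the highest likelihood. The sketch below assumes this grid-search formulation. `features_for_factor` is a hypothetical caller-supplied hook (for example, pitch-modify the signal by the factor and re-extract MFCCs), and a single diagonal-covariance Gaussian stands in for the recognizer's acoustic models; neither is taken from the paper.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]


def diag_gaussian_loglik(x: Frame, mean: Frame, var: Frame) -> float:
    """Log-likelihood of one feature frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def select_pitch_factor(
    utterance: object,
    candidate_factors: Sequence[float],
    features_for_factor: Callable[[object, float], List[Frame]],
    mean: Frame,
    var: Frame,
) -> float:
    """Return the candidate factor whose transformed features score highest.

    ASSUMPTION: `features_for_factor` is a hypothetical hook that applies a
    pitch modification by `factor` and returns the resulting feature frames.
    The single Gaussian (`mean`, `var`) is a stand-in for real acoustic models.
    """
    best_factor = None
    best_score = -math.inf
    for factor in candidate_factors:
        frames = features_for_factor(utterance, factor)
        # Total log-likelihood of the utterance, treating frames as independent.
        score = sum(diag_gaussian_loglik(f, mean, var) for f in frames)
        if score > best_score:  # strict '>' keeps the first factor on ties
            best_factor, best_score = factor, score
    if best_factor is None:
        raise ValueError("candidate_factors must not be empty")
    return best_factor


if __name__ == "__main__":
    # Toy example: features scale linearly with the factor (a hypothetical
    # stand-in for real pitch modification followed by MFCC extraction).
    utterance = [[1.25, 2.5], [1.25, 2.5]]

    def toy_features(utt, factor):
        return [[value * factor for value in frame] for frame in utt]

    chosen = select_pitch_factor(
        utterance,
        candidate_factors=[0.7, 0.8, 0.9, 1.0],
        features_for_factor=toy_features,
        mean=[1.0, 2.0],
        var=[0.1, 0.1],
    )
    print(chosen)  # 0.8
```

In a full recognizer the score would more plausibly come from aligning the utterance against the trained HMMs (for instance, using a first-pass transcription), and the candidate grid and pitch-modification method are design choices the abstract does not specify.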

Similar resources

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new, noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization

Most commonly used model adaptation techniques apply linear/affine transformations to models or features to address the gross acoustic mismatch between adults' and children's speech data. Since some sources of acoustic mismatch may not be appropriately modeled by a linear transformation alone, in this work, the efficacy of our recently proposed explicit acoustic (pitch and speaking rate) norma...

Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language

Setting up an emotion recognition or emotional speech recognition system depends directly on how emotion changes speech features. In this research, the influence of anger and happiness on speech was evaluated and the results were compared with neutral speech. To this end, the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...

Exploring the Effect of Differences in the Acoustic Correlates of Adults' and Children's Speech in the Context of Automatic Speech Recognition

This work explores how mismatches between adults' and children's speech, arising from differences in various acoustic correlates, affect automatic speech recognition performance under mismatched conditions. The different correlates studied in this work include the pitch, the speaking rate, the glottal parameters (open quotient, return quotient, and speech quotient), and the formant frequencie...

A Study on the Effect of Pitch on LPCC and PLPC Features for Children's ASR in Comparison to MFCC

In this work, following our previous studies, we study and quantify the effect of pitch on LPCC and PLPC features and explore their efficacy, in comparison to MFCC, for children's ASR under mismatched conditions. Our analysis shows that, unlike MFCC, the LPCC feature is not strongly influenced by pitch variations. On the other hand, similar to MFCC, though PLPC is also found to be significantly affected by pitch variatio...

Publication date: 2009